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Abstract 

This paper deals with the problem of estimating the covariance matrix of a series of independent 
multivariate observations, in the case where the dimension of each observation is of the same order as 
the number of observations. Although such a regime is of interest for many current statistical signal 
processing and wireless communication issues, traditional methods fail to produce consistent estimators 
and only recently results relying on large random matrix theory have been unveiled. 

In this paper, we develop the parametric framework proposed by Mestre, and consider a model 
where the covariance matrix to be estimated has a (known) finite number of eigenvalues, each of it with 
an unknown multiplicity. The main contributions of this work are essentially threefold with respect to 
existing results, and in particular to Mestre's work: To relax the (restrictive) separability assumption, to 
provide joint consistent estimates for the eigenvalues and their multiplicities, and to study the variance 
error by means of a Central Limit theorem. 

I. Introduction 

Estimating the covariance matrix of a series of independent multivariate observations is a crucial issue 
in many signal processing applications. A reliable estimate of the covariance matrix is for instance needed 
in principal component analysis JT), direction of arrival estimation for antenna arrays O, blind subspace 
estimation j3), capacity estimation H, estimation/detection procedures 0, Q, etc. 

In the case where the dimension N of the observations is small compared to the number M of 
observations, then the empirical covariance matrix based on the observations often provides a good 
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estimate for the unknown covariance matrix. This estimate becomes however much less accurate, and 
even not consistent with the dimension N getting higher (see for instance [6, Theorem 2]). 

An interesting theoretical framework for modern estimation of multi-dimensional variables occurs 
whenever the number of available samples M grows at the same pace as the dimension N of the 
considered variables. Shifting to this new assumption induces fundamental differences in the behavior of 
the empirical covariance matrix as analyzed in Mestre's work 0, Q. Recently, several attempts have 
been done to address this problem (cf. O, Q, El, O) using large random matrix theory which proposed 
powerful tools, mainly spurred by the G-estimators of Girko ifTUll . to cope with this new context. This 
was for instance the main ingredient used in [8] and iTTTTl . where grid-based techniques for inverting the 
Marcenko-Pastur equation were proposed. 

In this article, we shall consider the case where the dimension of each observation N together with the 
sample dimension M go to infinity at the same pace, i.e. that their ratio converges to some nonnegative 
constant c > 0. In order to present the contribution provided in this paper, let us describe the model 
under study. 

Consider a NxM matrix X^r = (Xij) whose entries are independent and identically distributed (i.i.d.) 
random variables. Let Rjy be a N x N Hermitian matrix with L (L being fixed and known) distinct 
eigenvalues < p\ < • • • < pL with respective multiplicities N\, ■ ■ ■ , Nl (notice that J2i=i = -^0- 
Consider now 

Yjy = R^ 2 Xjy • 

The matrix Y^r = [yi,--- ,Ym] is the concatenation of M independent observations, where each 

1/2 

observation writes yi = Xi with Xjv = [xi, • • • , xjvfj. In particular, the covariance matrix of each 
observation y, is R^r = Ey,y^ (matrix R^ is sometimes called the population covariance matrix). 

We consider the problem of estimating individually the eigenvalues pi as well as their multiplicities 
Ni. Among the proposed parametric techniques, we cite the one developed by Mestre [7] and taken up 
by Vallet et al 11121 and Couillet et al lfl"3l for more elaborated models. Although being computationally 
efficient, this technique requires a separability condition, namely the assumption that the number of 
samples is large compared to the dimension of each sample (small limiting ratio c = lim -jX > 0). In 
such a case, the limiting spectrum of the empirical covariance matrix possesses as many cluster^s there 
are eigenvalues to be estimated, and each eigenvalue can be estimated by a contour integral surrounding 
the related cluster. Mestre's technique cannot be applied anymore in the case where c is larger (which 

'By cluster, we mean a connex component of the support of the limiting probability distribution of the spectrum. 
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reflects a higher dimension of the observation dimension with respect to the sample dimension). In fact, 
the dimension of the clusters may grow and neighbouring clusters may merge, violating the one-to-one 
correspondence between clusters and eigenvalues to be estimated (see for instance Fig. [T] and [2]). 



- Asymptotic spectrum 
Empirical eigenvalues 



3 10 
Eigenvalues 




Eigenvalues 



Fig. 1. Empirical and asymptotic eigenvalue distribu- Fig. 2. Empirical and asymptotic eigenvalue distri- 

tion of Rjv for L = 3, pi = 1, p2 = 3, p3 = 10, bution of Rjv for L = 3, pi = 1, p2 = 3, p3 = 5, 

N/M = c = 0.1, N = 60, Nx = N 2 = 7V 3 = 20. N/M = c= 3/8, N = 30, N x = N 2 = N 3 = 10. 

A way to circumvent the separability condition has recently been proposed by Bai, Chen and Yao |[T4ll . 
based on the use of the empirical asymptotic moments: 

d fc = ^TY (Y N Y N ) k ,ke {l,--- ,2L}, 

which can be shown to be a sufficient statistics to estimate (%■,••• ,j^,Pir~' iPl)- Although being 
robust to separability condition, this technique suffers from numerical difficulties, since the proposed 
estimator has no closed-form expression and thus should be determined numerically. An interesting 
contribution, although not directly focused on estimating the covariance of the observations is the work 
by Rubio and Mestre lfT31 . where an alternative way to estimates the moments 

lk = ^Tr(R^), 

for all k G N is proposed, yielding an explicit (yet lengthy) formula. 

In this paper, we improve existing work in several directions: With respect to Mestre 's seminal papers 
@, 0, we propose a joint estimation of the eigenvalues and their multiplicities, and drop the separability 
assumption. The proposed estimator is close in spirit to the one in lfl4l . although we carefully establish 
the existence and uniqueness of the estimator, a fact that is not explicit in lfl4l (we shall also mention a 
close ongoing work by Li and Yao, not yet disclosed to our knowledge) . Finally, we study the fluctuations 
of the estimator and establish a Central Limit theorem. 
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The remainder of the paper is organized as follows. In Section [TTJ the main assumptions are provided 



and Mestre's estimator Q is briefly reviewed. In Section III the proposed estimator is described. Its 



fluctuations are studied in Section |IVJ where a central limit theorem is stated. Simulations are presented 
in Section [Vj and a discussion ends the paper in Section VI Finally, the remaining technical details are 
provided in the Appendix. 



II. Main assumptions and general background 

A. Notations 

In this paper, the notations s,x, M stand for scalars, vectors and matrices, respectively. Superscripts 
(•) T and (-) H respectively stand for the transpose and transpose conjugate; trace of M is denoted by 
Tr(M); determinant of M, by det(M); the mathematical expectation operator, by E. If z G C, then ^St(z) 
and Ss(z) respectively stand for z's real and imaginary parts, while i stands for z stands for z's 

conjugate. 

If Z G c^xN is a nonnegative Hermitian matrix with eigenvalues (&; 1 < i < N), we denote in the 
sequel by F z the empirical distribution of its eigenvalues (also called spectral distribution of Z), i.e.: 

1 N 
1=1 

where S x stands for the Dirac probability measure at x. 

d y 

Convergence in distribution will be denoted by — >, in probability by — >■; and almost sure convergence, 

. a.s. 

by — >. 



B. Main assumptions 
Consider the model 

and 



At first, an assumption about the matrix Rjy is needed: 

Assumption 1: RjyisaA^xA^ Hermitian non-negative definite matrix with L (L being fixed) distinct 
eigenvalues < pi < • • • < pi with respective multiplicities A r i, • • • , Nl (notice that Y2i=i Ni = N). 
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As mentioned earlier, we consider the asymptotic regime where the number of samples M and the 
number of variables N grow to infinity at the same pace, together with the multiplicities of each of R^r's 
eigenvalues. 

Assumption 2: Let M, N be integers such that: 

N,M -►oo, with — -> c G (0, oo) , and -> Cj G (0, oo) , 1 < i < L. (1) 

This assumption will be shortly referred to as N, M — > oo. 

The following assumption is standard and is sufficient for estimation purposes. 

Assumption 3: Let Xjv = (Xij) be a N x M matrix whose entries are i.i.d. random variables in C 
such that E(Xi,i) = 0, E(|Xi,i| 2 ) = 1 with finite fourth moment: E(|Xi,i| 4 ) < oo. 

Remark 1: In order to establish the fluctuations of this estimator, the Gaussianity of the entries of Xjv 
is needed (although this technical condition may be removed with substantial extra work). 

Assumption 3b: The entries of the N x M matrix X^v = {Xij) are i.i.d. standard complex Gaussian 
variables, i.e. = U + iV, where U, V are both independent real Gaussian random variables N(0, |). 

It is well-known in large random matrix theory that under Assumptions [I] [2] and [3] F^ N converges 
to a limiting probability distribution. In Mestre's paper (71, a separability conditioj^is needed in order 
to derive the estimator of Rjy's eigenvalues: 

Assumption 4: The support S of the limiting probability distribution of F Rlv is composed of L 
compact connex disjoint subsets, and not reduced to a singleton. 

Remark 2: Note that when M < N, matrix R^r is singular and thus admits (N — M) eigenvalues 
equal to zero. Hence, the limiting spectrum of R^r has an additional mass in zero with weight 1 — i, 
which will not be considered among the L clusters. 

The separability condition is illustrated in Fig. [T] and [2] In both figures, the limiting distribution of 
pA N j s drawn (red line). In Fig. [lj R^'s eigenvalues are pi = 1, pi = 3, ps = 10, they have the 
same multiplicity and the ratio c is equal to 0.1. In this case, the separability condition is satisfied as the 
limiting distribution exhibits 3 clusters. The separability condition is no longer satisfied in Fig. [2| where 
pi = 1, p2 = 3, p3 = 5 and c = 0.375, but where the limiting distribution only exhibits a single cluster. 

2 The precise technical statement of the separability condition together with a mathematical interpretation are available in (7), 
but are not necessary here. 
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C. Background on Large Random Matrices, Mestre 's estimators and their fluctuations 

The Stieltjes transform has proved since Marcenko and Pastur's seminal paper |[T6l to be extremely 
efficient to describe the limiting spectrum of large dimensional random matrices. Given a probability 
distribution P defined over M + , its Stieltjes transform is a C-valued function defined by: 

mp (z) = [ ^ , z G C\R + . 
JR+ X — z 

In the case where F z is the spectral distribution associated to a nonnegative Hermitian matrix Z G C NxN 
with eigenvalues (&; 1 < i < N), the Stieltjes transform mz of F z takes the particular form: 

F z (d\) 



m z {z) 



X-z 

N 



1 



i=l Si 

which is exactly the normalized trace of the resolvent (Z — zl^)^ 1 - 

An important result associated to the model under investigation here is Bai and Silverstein's description 
of the limiting spectral distribution of R^r ifTTl (see also lfT6l ): 

Theorem 1: ifTTll Assume that Assumptions [l] [2] [3] hold true and denote by _F R the limiting spectral 
distribution of Rjv, i.e. F R (dX) = Ylk=i c k^p k {dX). The spectral distribution F RjY of the sample 
covariance matrix R^v converges (weakly and almost surely) to a probability distribution F as M, N — > 
00, whose Stieltjes transform m(z) satisfies: 

, , 1 , , / 1 \ 1 

m(z) = -miz) — 1 — , 

c \ cj z 

for z G C + = {z G C, 9(2;) > 0}, where m(z) is defined as the unique solution in C + of: 

^ ) = -( z - c /rTib) dFR<t) ) 

Remark 3: Note that m(z) is also a Stieltjes transform whose associated distribution function will be 
denoted F, which turns out to be the limiting spectral distribution of F— N where is defined as: 

RjV — ^X^RatXtv • 

Remark 4: Denote by m-R N {z) and (z) the Stieltjes transforms of F Rjv and F— N . Notice in 
particular that 

M , . / M\ 1 
Rw iV — « y N J z 
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Remark 5: Denote by mj^(z) and m N (z) the finite-dimensional counterparts of m(z) and m(z), 
respectively, denned by the relations: 

rn N (z) = -(z-^J j+^dF^ (i)) ~\ 
m N {z) = § rn N (z) - (l - f ) ± . 

It can be shown that and m N are Stieltjes transforms of given probability measures Fjy and F_ N , 
respectively (cf. lPT8l Theorem 3.2]). 

In Q, Mestre proposes a novel approach to estimate the eigenvalues (p^; 1 < k < L) of the population 
covariance matrix based on the observations R^ under the additional Assumption [4] His approach relies 
on large random matrix theory and the separability condition presented above plays a major role in the 
mere definition of the estimators. As it will be a useful background in the sequel, we provide hereafter 
a brief description of Mestre's results: 

Theorem 2: Q Denote by Ai < • • • < \n the ordered eigenvalues of R^. Under Assumptions [l] [2] 
[3] |4] and assuming moreover that the multiplicities N\, - ■ ■ , Nl are known, the following convergence 
holds true: 

h - Pk > , (4) 

where 

with N fe = {£JZ} Nj + 1, . . . , £ U Nj} and 

Al < • • • < fiN the (real and) ordered solutions of: 

repeated with their multiplicites. When N > M, we use the convention jx\ = ■ ■ ■ = /ijv-Af+l = 0) 
whereas pn-m+2, ■• • j AiV contain the positive solutions to the above equation. 

Remark 6: Notice that ([6]) associated to Q readily implies that for non null /tj, = 0. 

Otherwise stated, the Aj's are the zeros of m-n . This fact will be of importance in the sequel. 

fijv 

Sketch of proof : We can now describe the main steps of Theorem [2] By Cauchy's formula, write: 



where is a positively oriented (clockwise) contour taking values on C\{pi, • • • , pl} and only enclosing 
Pk- With the change of variable w = an d the condition that the limiting support S of the 
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eigenvalue distribution of R^v is formed of L distinct clusters 1 < k < L) (cf. Figure [I]), we can 
write: 

M f m'(z) , 

where denotes positively oriented contours which enclose the corresponding clusters Defining 

M f m 'n ^ 

p k A — _ I z - N - dz , 1 < Jfc < L , (8) 



2mN k J &k m^(z) 

dominated convergence arguments ensure that pk — Pk ^ 0, almost surely. The integral form of pk can 
then be explicitly computed thanks to residue calculus, and this finally yields ([5]). □ 

Recently a central limit theorem has been derived |[T9l for this estimator under the extra assumption 
that the entries of X^r are Gaussian . 

Theorem 3: lfT9ll With the same notations as before, under Assumptions [TJ [2j 3b, [4] and with known 
multiplicities N\, - ■ ■ , Nl, then: 

(M(p k -p k ), l<k<L) x ~ Nl(0, 0) , 

M,N— too 

where 'Nl refers to a real L-dimensional Gaussian distribution, and is a L x L matrix whose entries 
Q k £ are given by, 



9 



rn' (zi)m' (Z2) 1 



dz\dz2 , 



m(zi)m(z2) 



4-K 2 c 2 c k ce 7 efc / e< [(m(zi) - m(z 2 )) 2 (z 1 - z 2 ) 2 _ 
where (resp. Qg) is a closed counterclockwise oriented contour which only contains the k-th cluster 
(resp. £-th) . 

The proof of this theorem is based on ll20ll and the continuous mapping theorem. Details are available 
in ED. 

The main objective of this article is to provide estimators for the /Vs without relying anymore on the 
separability condition (i.e. to remove Assumption [4]). A Central Limit Theorem will be established as 
well for the proposed estimator. As a by-product, the knowledge of the multiplicities will no longer be 
needed, and they will be estimated as well. 

III. Estimation of the eigenvalues pi 

In this section, we provide a method to estimate consistently the eigenvalues of the population co- 
variance matrix without the need to the separability condition (cf. Fig. [2]). Our method is based on the 
asymptotic evaluation of the moments of the eigenvalues of Rjy, 7? = Sfe=i Tv /°1> 1 ^ * ^ — 1. If 
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{fhi)l<i<2L-i are the empirical moments of the sample eigenvalues, then it is well known that except 
for i = 1, 7j cannot be approximated by m». Consistent estimators for ji are provided in IPT51 , where it 
has been proved that: 



where 



7i - 7i > 0) 

N,M^+oo 



7t = ^2^s(l,i)fhi, 
i=i 



Hs(l, i) being some given coefficients that depend on the system dimensions and on the empirical moments 
fhi lfT3Tl . An alternative is to use the Stieltjes transform: 

Lemma 1: Assume that Assumptions [T] [2] and [3] hold true. Let 73 be the real quantities given by: 

7o = 1, 



< 



zm'- (z) 



71 = -2fkh mf(z) dz > 



% = ^^k^tvy for2<fc<2L-l 
where C is a counterclockwise oriented contour which encloses the support S of the limiting distribution 
of the eigenvalues of R N . Let 7$ be the moments of the eigenvalues of Rjy, i-e. ji = Ylk=l TfPk- Then, 
for 1 < i < 2L - 1, 

a.s. „ 

7i - 7t ^ • 

N,M-^oo 

The proof of this lemma is postponed to Appendix [Aj While the estimates proposed by |[T5l are better 
in practice, estimates will be of interest in order to establish the central limit theorem, and to obtain 
a closed-form expression of the asymptotic variance. 

An interesting remark is that the map that links the eigenvalues and their multiplicities to their first 
2L — 1 moments is invertible. Retrieving the eigenvalues from the estimates of the 2L — 1 moments is 
thus possible. This is the basic idea on which our method is founded. 

The main result is stated as below: 

Theorem 4: Recall the notations of Lemma [T] and consider the system of equations: 

Si=l x i = 1> 

Y%=lX i y i = j 1 , (9) 
Ef=i XiVi = % for 2 < k < 2L - 1, 
where (xj)i<i<z and {yi)i<i<L are 2L unknown parameters. Then under Assumptions [TJ [2j [5] the 
system of equations ^ has one and only one real solution (ci, • • • , C£, pi, • • • , p^,) with pi < • • • < pl. 
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Moreover, (ci, • • • , cl, pi, ■ ■ ■ , pi) is a consistent estimator of (ci, • • • , c_l, pi, ■ • ■ , pi), ie., 



dp — cp — — — > and be — pp — — — > 0, 



with c t = lim ^ for 1 < i < L. 

Remark 7: The condition of separability is not required in the previous theorem. Moreover, the mul- 
tiplicities are assumed to be unknown and thus have to be estimated. Fig [2] represents a case where the 
three clusters are merged into one cluster. In such a situation, the estimator in is biased whereas the 
proposed one is asymptotically consistent. 

Remark 8: We use the estimator proposed in Lemma [T] However, the proof below does not depend 
on the estimator of the moments we choose. In fact, for any consistent estimator of the moments j{, the 
above theorem always holds true. 

Proof: The proof can be split into two main steps. By using the inverse function theorem, we can 
prove the almost sure existence of a real solution. Then, the uniqueness is ensured by a matrix inversion 
argument. 

1) Existence of a real solution of the system. 

The first task is to show that the system of equations ([9]) admits, for N sufficiently large, one real 
solution (ci, • • • , C£, pi ■ ■ ■ , pl) satisfying pi < f>2 < • • • < PL- We shall also establish the consistency 
of the obtained solution. The proof of the existence of a real solution follows in the same way as in 0. 
It is merely based on the use of the inverse function theorem which ensures the existence as soon as 
the Jacobian matrix of the considered transformation is invertible. We recall below the inverse function 
theorem II2D : 

Theorem 5: |[2T1 Let / : R n — > M. n be a continuously differentiable function. Let a and b be vectors 
of M n such that /(a) = b. If the Jacobian of / at a is invertible, then there exists a neighborhood U 
containing a such that f : U — > f(U) is a diffeomorphism, i.e, for every y 6 f(U) there exists a unique 
x such that /(x) = y. In particular, / is invertible in U. 

Consider the functional / defined as: 



f(xi,--- ,x L ,yi,--- ,y L ) 



' L L L \ 

x e=i i=i i=i J 
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Consider z = (x±, ■ ■ ■ , xl, yi, ■ ■ ■ , Vl) and denote by c = (c\, ■ ■ ■ , C£,pi, ■ ■ ■ > Pl)', we then have: 

1 ... 1 o ••• 

Pl • • • PL Ci • • • CL 



M 



9/ 



• pf- 1 (2L-l)c lP f- 2 ••• (2L-l)c LP f- 2 
We will show that M is invertible by contradiction. Assume that M is singular. Then, there exists a non 
null vector A = [Ai, - - - , \2l] T such that M T A = 0. Consider the polynomial 



2L-1 



p(x) = x: Ai+ix*. 

We easily observe that M T A = implies that 

P(p e ) = P'{p e ) = , for 1 < £ < L . 

In particular, the multiplicity of each pe is at least 2. This is impossible since the degree of P is at most 
2L — 1 (recall that all the eigenvalues p£ are pairwise distinct). Matrix M is therefore invertible. The 
inverse function theorem then applies. Denote by tpi = J2k=i c kPk for < i < 2L — 1. There exists a 
neighborhood U of (ci, ■ ■ ■ , cl, pi, ■ ■ ■ , Pl) and a neighborhood V of (V>o, ■ • • , 4>2L-i) such that / is a 
diffeomorphism from U onto V. On the other hand, we have: 

a.s. „ 

a - a — > o. 

As 7j — ipi — > 0, therefore, almost surely, (%,■■■ ,721,-1) G for iV and M large enough. Hence, a 
real solution 

(ci,-- - ,c L ,pi,--- ,p L ) = / _1 (7o,-- - ,72L-i) e ?7 



exists. And by the continuity, one can get easily that: 



c-i - eg 



->• and - ^ 



2) Uniqueness of the solution of the system. 
Consider the polynomial Q with degree L defined as: 



AT M-s.00 



->• for 1 < £ < L . 



L L 

A 



where sl = 1. Denote by s = [sq, • • • , sl-i] T - It is clear that g : (pi, 



, pl) — > s is a homeomorphism. 



It remains thus to show that vector s is uniquely determined by (70, ■ ■ ■ ,j2L-i) 
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It is clear that each pk is also the zero of the polynomial functions R^(-X~) given 

L 



i=Q 



where < £ < L — 1. In other words, for 1 < k < L, we get: 



or equivalently: 



Summing (10 1 over k, we obtain: 



i=0 
L 



22 ii+t s i = o , 



i=0 



for < £ < L — 1. Since = 1, ( [TT] ) becomes: 

L 

7L + ^ + Si7 i+ ^ = , 

for < £ < L - 1. 



L-l 



i=0 



Writing ( [12] ) in a matrix form, we get: Ts = — b, where 

70 71 • • • 7L-1 

71 72 • • • 7L 



and b 



7l 



72L-1 



7L-1 1L • • • 72L-2 

On the other hand, we have Y = ADA T , where D = diag(ci, C2, • • • , c£) and 

1 1 ••• 1 
h f>2 ■■■ PL 



Pt 1 Pt 1 



fr 1 



Then, 



det(r) = Yl c k [J {pi- pj) 2 > 0. 

k=l l<i<j<L 

Therefore, the vector s is then uniquely determined by T and b and is given by: 



s = -r _1 b. 



Hence the unicity. The proof is complete. 
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IV. Fluctuations of the estimator 

In this section, we shall study the fluctuations of the multiplicities and eigenvalues estimators (ci, • • • ,&L,pi, 
introduced in Theorem [4] In particular, we establish a central limit theorem for the whole vector in the 
case where the entries of matrix Xjy are Gaussian. 

Theorem 6: Let Assumptions [T] [2j 3b hold true. Let (c\, ■ ■ ■ , cl, pi, ■ ■ ■ , Pl) be the estimators obtained 
in Theorem [4) Then 



M 



N 



N L 



Cl ~ —, ■■■ ,C L ~ -jy'Pl ~ PU" ■ ,PL~ PL 



■D 



>X 2L (0,G) 



where is a 2L x 2L matrix admitting the decomposition = M 1 WM lT with 

1 ... 1 o ••• 

Pl ' ' ' PL Cl • • • CL 



M 



pf- X 



Pl \ lL ~ l ) c ^Pi 



(2L - l)c L p% 



2L-2 



and 



W 




V 



where V is a (2L — 1) x (2L — 1) matrix whose entries are given by (for 1 < k, £ < 2L — 1): 

(-l) k+e f r ( m'(z 1 )rn'(z 2 ) 1 \ 1 

'd Je 2 \(m(zi) -m(z 2 )) 2 (zi - z 2 ) 2 



Vi 



kl 



-dz\dz 2 



4ir 2 c 2 Jp, Jp„ \ (m(z\ ) — m(z 2 )) 2 (zi — z 2 ) 2 ) m k (zi)m l (z 2 ) 
where Ci and Q 2 are two closed contours non-overlapping which contain the support S of F and are 

counterclockwise oriented. 

Proof: The proof relies on the same techniques used in fl9ll . We outline hereafter the main steps 
and provide then the details. 



, pl) verifies the following system of equations: 



By Theorem |4j the estimate vector (ci, • • • ,cl,Pi, 

, Eti = % for 2 < k < 2L - 1, 
where the 7j's are the moment estimates provided by Lemma [T] 

Using the integral representation of Ylu=i c iPi an( ^ E^=i c iPi Formula we get: 

' EiiM(c,-f)=0, 



Ef=i M (^-fft) 



2Niw Je * I m &N (z) 



Zt=iM{cip k 



N Pi) 



2Nin 

M 2 (-l) k r 

2i(fc-l)Af7r JC \m A (z) 



m N (z) 



1 



dz. 



i 



m N (z) k 



dz, 2 < k < 2L - 1. 
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Denote by C(G, C) the set of continuous functions from C to C endowed with the supremum norm 
IMloo = sup e \u\. In the same way as in |[T9l , consider the process: (Xn,X' n ,un,u' n ) : C — > C, where 

X N (z) = M (m^ (z) - m N (z) \ , 
X' N (z)=M(m!^(z)-m! N (z)), 
u N (z) = m-r, (z), u' N {z) = mL (z). 

—N =^-N 

Then, M^2f =1 (cipi — jj-pi) can be written as: 



L 



i=l 



Ni \ Mr fm N {z)X' N {z)-u' N (z)X N (z) 



iV y 2iA^7r Jg \ m N (z)uN(z) 
Tn(Xn, X n , UN, u N ), 



„ , , M f ( m N (z)x' '(z) — u' (z)x(z) , , 



where 



2iiV7r J e \ THn{ z ) u { z ) 
On the other hand, using the decomposition a k — b k = (a — b) Yli=o <^6 fc_1 ~^, we can prove that: 
V-^^-fc N i k\ M 2 (-l) k m A (z) - m N {z) 



2iiV7r(fc-l) J e £^ ml +1 (*)z4~ 1_ ^) 
M(-l) fc+1 



fc-2 



A 



2 ^ 



for 2 < fc < 2L - 1, where 



^ X ^= 2iN(k-l)J X x{z)u{z) 1 



The main idea of the proof of the theorem lies in the following steps: 

1) Prove the convergence of [Tn(Xn,X' n , un,u' n ), &n,2(Xn, un), • • • > $n,l(Xn, un)] T to a Gaus- 
sian random vector with the help of the continuous mapping theorem. 

2) Compute the limiting covariance between {p%p\ — wPi) anc * ^^2i=i {^iPi ~ wPi)- 

3) Conclude by expressing M \t\ — ■ • ■ , cl — j^, p\ — pi, ■ ■ ■ ,Pl~ Pl] T as a linear function of 

M [70 - 70, • • ■ , 72L-1 - 72L-i] T - 
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A. Fluctuations of the moments 

The convergence of Y^PC/v, X' N , u/y, u' N ) to a Gaussian random variable has been established in 
lfl9l . It has been proved that: 

T N (X N ,X' N ,u N ,u' N ) - > T(X,Y,m,m) 

M.N— s>oo 



where 

T(nr. 11 71 m\ = 

2mc J e \ m(z)v(z) 
and (X, Y) is a Gaussian process with mean function zero and covariance function given by: 



1 f ( m(z)y(z) - w{z)x(z)\ 

T(x,y,v,w) = — — <p z I r^r~r~\ " ,z - 



mf (z)mf (z) 
(m(z) — m(z))' 2 {z — z) 



C0V(X(Z),X(Z)) = -j—^ ^2 - — = K ( Z ' Z )' 



d 

cov (Y(z),X(z)) = —k(z,z), 
oz 

d 

cov (X(z),Y(z)) = —k(z,z), 

cov(Y(z),Y(z)) = J^-k(z,z). 

ozoz 

We also need to prove the convergence in distribution of <J>jv,fc(-Xjv> un), for 2 < k < L. The cornerstone 
of the proof is the convergence of X^ : C — > C to a Gaussian process X(z) which is ensured in ll22l 
Lemma 9.111. Since un > m, (Xn,un) converges in distribution to (X, m). 

N,M-++oc ~ ~ 



Let $fc(x, u) be defined as: 



$k(x,u) = . & x(z)u(z)~ k dz. 

2lC7T ' 



e 



We want to show that <£fc(Xjv, un) converges in distribution to a Gaussian vector. The continuous 
mapping theorem is useful to transform one convergence to another. 

Proposition 1 (cf. /l23l Th. 4.27]): For any metric spaces Si and S2, let £, (£ n ) n >i be random elements 

D 

in Si with £ n > £ and consider some measurable mappings /, (fn)n>i- Si — y S2 and a measurable 

n— >oo 

set rcSi with (GT a.s. such that f n {s n ) ->■ f(s) as s n ->■ s G T. Then / n (£ n ) 33 > /(£). 



Consider the set: 



(a, it) G C 2 (C,C),inf |u| > 

e 



Then, since infg \rn\ > (see [22, Section 9.12]), the dominated convergence theorem implies that the 
convergence of (xn,Vn) — > (x,y) G V leads to ^N,k{xN,UN) — > &k( x ,y)- The continuous mapping 
theorem applies, thus giving: 

®nA X N,un) > $k(X,u). 

M,N— >oo 



January 24, 2012 



DRAFT 



16 

It now remains to prove that the limit law <J>fc (X, u) is Gaussian. For that, it suffices to notice that the 
integral can be written as the limit of a finite Riemann sum and that a finite Riemann sum of the elements 
of a Gaussian random vector is still Gaussian. 

The convergence of Tn(Xn, X' n , un, u' n ) and $jv,jfc(^iV) un) is not sufficient to conclude about that 
of the whole vector. The additional requirement is to prove the convergence to a Gaussian distribution of 
any linear combination of [T N (X N , X' N , itjy, u' N ), ^n^ 2 (X n , u n ), ■ ■ • , & N L (X N , un)] T , which can be 
easily established in the same way as before. It implies that this vector converges to a Gaussian vector. 
This ends the first step of the proof. 



B. Computation of the variance 

We now come to the second step. We shall therefore evaluate the quantities: 

V M =E[T(X,Y,m,m)T(X,Y,m,m)] , 

Vi, fc = V M = E [T(X, Y, m, m')**(*> ™)] > 2 <k < L, 

V M = E [* k (X, m)$ e (X, m)}, 2 < k,£ < 2L - 1. 



The details of the calculations are in Appendix IB] and yield: 



V 



-1 



\k+i 



4tt 2 c 2 



1 



m' (zi)mf (Z2) 
(m(zi) -m{z 2 )f {z\-z 2 f 



m k (zi)m e (z 2 ) 



dz\dz 2 



for 1 < k, £ < 2L - 1. Let w M = M [% - 70, • • • , 72L-1 - 72L-1] 
We have just proved that vector wm converges asymptotically to: 

•n 



N,M->+oo 



>N 2L (0,W), 



where 



W 




V 



and V is the (2L — 1) x (2L — 1) matrix whose entries T4,z are given by §13\ . 
Remark 9: The zeros in the variance are simply from the fact that 70 — 70 = 0. 



(13) 



N ' 



) CL - ^TiPl - Pi, ■ ■ ■ ,PL~ PLY » we sha11 



C. Fluctuations of the eigenvalues estimates 

To transfer this convergence to qM — M \c\ 
use Slutsky's lemma which is as below: 

Lemma 2 (cf. I[24\l): Let X n , Y n be sequences of vector or matrix random elements. If X n converges 
in distribution to a random element X, and Y n converges in probability to a constant C, then 
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Y- x X n ^ C X X 



provided that C is invertible. 

We will show that satisfies the following linear system: 



wm = M A/ q/i/ 



(14) 



where M.m converges in probability to M which is given by 



M 



1 

Pi 



1 

PL 







pf" 1 







• pf^ (2L-l)c lP f- 2 ••• (2L - l) CL ff ~\ 

To this end, let us work out the expression of w^m, the k-th element of wjj, 
If = 1, it is easy to see that w\^m = 0. 
For k > 2, Wk,M is given by: 

L 



L 



N, 



i fc-l 



/V' 



i=i 

L 



M 2^ ( c iPi -—Pi + —p. 



fc-1 fc-1 



AT' 



TV' 



i=i 



N 



N 

k-2 



Pi 



e=o 



Then define 



/ 1 

Pi 



-2L-1 



1 

PL 



Ni 
N 



Nl 
N 



. . . -2L-1 Ni y-2L-2 # 2L-2-1 ... Nt V-2L-2 21-2- . 



We can see easily that the equation ( |14[ ) is satisfied and Mm converges in probability to M. It remains 
to check that M is invertible. Note that the non- singularity of matrix M has been already established in 
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Section [TTTJ where this property was required to prove the existence of an estimator. As a consequence, 
using Slutsky's lemma, we deduce that: 

M M q M ► X 2L (0, W) 

M,N->+oc 

and 



qM > ^2L (0,M- 1 W(M- 1 ) T ) 



This ends the proof for the fluctuation. 



V. Simulations 

In this section, we compare the performance of the proposed method with that of Mestre's estimator 
in Q. We also verify by simulations the accuracy of the Gaussian approximation stated by the Central 
Limit theorem. 

In the first experiment, we consider a covariance matrix R^r with three different eigenvalues (pi, p%, pz) = 
(1,3, 5) uniformly distributed i.e, ^ = ^ = ^ = |. We set the ratio between the number of samples 
and the number of variables jj to 3/8, a situation for which the separability does not obviously hold 
(see Fig. [2]). Since the knowledge of the multiplicities is available when using the estimator in Q, we 
assume, the same for the proposed method. Hence, the estimation of the polynomial whose roots are pi 
could not be as described previously. It is actually performed using the Newton-Girard formulas, which 
relates the coefficients of a polynomial to the power sum of its roots. 

We compare the performance of both estimators for different values of M and N satisfying a constant 
ratio c = N/M = 3/8. Fig [3} the experienced mean square error (MSE) in the estimation process for 
each method, where the MSE is given by: 

3 

MSE = 53|ft- Pi | 2 . 

i=i 
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Fig. 3. Experienced MSE with N when § = | and Fi g- 4 - Experienced MSE with TV when § = § and 

( P x,P2,p 3 ) = (1,3,5) (Pi,p2,pa) = (1,1.5,2) 

We note that as M and N increase, the estimator in [7] exhibits an error floor since the separability 
condition is not satisfied and thus is no longer consistent. We also conduct the same experiment when 
px, p2 and p$ are set respectively to 1,1.5 and 2. We note that in this case, the asymptotic gap with 
Mestre's estimator is further large (See Fig. 4). 

In the second experiment, we verify by simulations the accuracy of the Gaussian approximation. We 
consider the case where there are two different eigenvalues pi = 1 and p 2 = 3 that are uniformly 
distributed. Unlike the first experiment, we assume that the multiplicities are not knwon. We represent 
in Fig [5] the histogram for p\ and p 2 when N = 60 and M = 120. We also represent in red line, the 
corresponding Gaussian distribution. We note that as it was predicted by our derived results, the histogram 
is similar to that of a Gaussian random variable. 



VI. Discussion 

The present work is a theoretical contribution to the important problem of estimating the covariance 
matrices of large dimensional data. Two important assumptions (separability condition, exact knowledge 
of the multiplicity) have been in particular relaxed with respect to previous work. From a numerical 
point of view, it should be noticed however, that the situation is more contrasted: If the multiplicities are 
known, previous simulations show good performance; if not, then one needs to enlarge the dimension of 
the observations to achieve a good performance. Moreover, if the eigenvalues of R^r are far away from 
each other, then only the largest eigenvalue is well-estimated because in the expression of the moments, 
the term corresponding to the largest eigenvalue prevails. On the other hand, if the eigenvalues are too 
close to each other, matrix T is ill-conditioned, thus enlarging the induced error. These phenomenas are 
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Estimates 



Fig. 5. Comparison of empirical against theoretical variances for Ci = C2 = 0.5 and pi = 1 and p2 — 3 

inherent to the moment method, and preliminary studies show that using trigonometric moments might 
help mitigating these numerical problems. 



Appendix A 
Proof of lemma Q] 



By Cauchy's formula, write: 



-duj, 



k=l JL r=l rr 

where T is a counterclockwise oriented contour that circles all eigenvalues {pi, • • • ,/>£,}. Performing the 
changing variable oj = in the same manner as in Q, we get: 

~N 



2^ N Pk 9.WM ?„2^^e+i 



N r mf N (z)dz 



2ittN /e^^W^W + l) 
where the contour C is counterclockwise oriented which contains the whole support S. 
From Q, we can establish that: 



1 L 



N r 



> 1 + p r m N (z) ' 



thus yielding: 



fe=i 



N 



Pk 



2i7T J en /+\z) 



e+if m N (z)dz. 



(15) 
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Plugging the relation: 



, . M . . M(l - 
m N {z) = -j^m N (z) + — 



AT N 

M / 



into ( pT5] >, we obtain: 

L 



k=l 



2ivr J e ATm^(z) 2ivr / e iVm^+^z) 



(-1)' I Mzm' N (z)dz + (-If ^ M(l-f)m^(z) ^ 



(16) 



Since ^i+ifl is the derivative of — }, 



cfe = 0. 



The second term on the right hand side of ( fT6] ) is then equal to zero. It remains thus to deal with 
r z ™n( z ) _ if £ > 2, by integration by parts, we obtain: 



We thus obtain: 



zrn' N (z) 1 / dz 



A iV fe , _ M(-l) 



(17) 



AT™ 2i7riV(£-l) Jem^ 1 ^)' 
Finally, we propose to substitute the unknown term m N (z) by its asymptotic equivalent rrif. (z). Let 

— V 

7o> • ■ ■ > I2L-1 the real quantities given by: 

= 1, 



7o 
7i 



2iVi7r JC m ft (z) 



M(-l) 2 



72L-1 - 2N{2L-l)m §d m^-Uz)- 

Then, by the dominated convergence theorem and the fact that with probability one 



Section 9.12], 



and 



one obtains: for all k > 2, 



and 



Consequently: 



inf \m N (z)\ > 
zee 



inf \rrif. (z)\ > 0, 



dz 



rnf N (z)dz 



* .Y 

e m R ( z ) 



^ 0. 



7i - 7i 



N,M->oc 
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Appendix B 
Calculation of the variance 

In this section, we will show the calculations of the variance matrix V. The computation of Vi i has 
been carried out in [19] where it was shown that: 



Vi,i 



l 



4ir 2 c 2 



m' (zi)m' \z 2 ) 



1 



1 



m(zi)m(z 2 ) 



dz\dz 2 



t 1 J e 2 - m{z 2 )) 2 {z 1 -z 2 ) 2 _ 

with Si and S 2 defined in the theorem. Using the fact that inf^gg |zZl(2)| > together with Fubini's 

theorem, the quantity for k > 2, £ > 2, becomes: 

t_l\k+e 



V fc , 



4vr 2 c 2 j Gl je a 
Substituting E [X(zi)X(z 2 )] by k(z±, z 2 ), we obtain: 



E [X{z 1 )X{z 2 )\ rn- k {z l )mr t {z 2 )dzidz 2 . 



V 



-1 



\k+l 



kl 



4ir 2 c 2 



ml \zi)mf (z 2 ) 1 
(m(zi) - m(z 2 )) 2 (zi - z 2 ) 2 



m k (zi)rrr (z 2 ) 



dz\dz 2 



Finally, it remains to compute V^j. Expanding T(X,Y,m,m') and 3>fc(X, m), we obtain: 



V 



■1 



,/c+l 



fc.l 



47T 2 C 2 



Ci >/e 2 



22 



(-1) 



k+l 



m(z 2 )m k (zi) 
z 2 8 2 k(z 1 , z 2 ) 



Att 2 c 2 \J e± 7 e2 m(z 2 )m(zi) 
By integration by parts, we obtain: 

z 2 d 2 n(z 1 ,z 2 ) 



E [X( Zl )X'(z 2 )] dz x dz 2 
-dz\dz 2 



m/(z 2 ) 



m(z 2 ) 2 m k (zi 

rn'(z 2 )n(zi,z 2 ) 
— 2 / \ h ( \ dzidz 2 
e 1 Je 2 m 2 {z 2 )m k {zi) 



-E[X( Zl )X(z 2 )] 



dz\dz 2 



e 2 m(z 2 )m^{z lj 



-dz 2 



k(zi, z 2 ) 
e 2 m(z 2 )m k (z 1 ] 



-dz 2 + 



m!(z 2 )K(zi,z 2 ) 
' e m(z 2 ) 2 m k (z 1 ) 



dz 2 . 



Hence, 



V fc ,i 



(-l) fc+1 f f k(zi, z 2 )dzidz 2 
4vr 2 c 2 / Cl / e2 m(z 2 )m k (zi) 



This extends the expression of / for any k,£ E {1, • • ■ , L — 1}, thus yielding: 



V 



kj 



-1) 



47T 2 C 2 



d Je 



m' (zi)mf (z 2 ) 1 
(m(zi) -m(z 2 )) 2 (^i-^) 5 



1 



m k {z\)rn l -(z 2 ) 



dz\dz 2 . 



(18) 
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